Automatic generation of phone sets and lexical transcriptions

Authors

Rita Singh

Bhiksha Raj

Richard M. Stern

Abstract

Automatic Speech Recognition (ASR) systems that have even moderately large recognition vocabularies model the words in those vocabularies as sequences of subword units, or phonemes. The set of these phonemes, or the phoneset, forms the basic units that the ASR system is trained to classify. This set is usually small, typically consisting of about 40 phones for English. The ASR system uses a dictionary in which all the words in the system's vocabulary are transcribed in terms of these phones. The phoneset and the dictionary are specific to a language and are designed manually by an expert. The performance of the ASR system is critically dependent on the accuracy of the dictionary. In this paper we attempt to design the phoneset and the dictionary automatically, using only the training data and their transcriptions. To do this, we jointly optimize the dictionary and the acoustic models for an evolving phoneset, using a maximum a posteriori (MAP) formulation to optimize the dictionary and a maximum likelihood (ML) formulation to optimize the acoustic models. Experimental results on the Resource Management (RM) corpus show that such an automatically derived phoneset results in recognition accuracies close to those obtained using a manually designed phoneset and dictionary.
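To make the role of the dictionary concrete, the following is a minimal sketch of how a pronunciation dictionary maps words to phone sequences and how the phoneset falls out of it. It is not the paper's method: the three-word lexicon, the ARPAbet-style phone symbols, and the helper names `phoneset` and `transcribe` are all assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicon; the words and ARPAbet-style symbols are
# illustrative assumptions, not taken from the paper or the RM corpus.
LEXICON: Dict[str, List[str]] = {
    "show": ["SH", "OW"],
    "all": ["AO", "L"],
    "ships": ["SH", "IH", "P", "S"],
}


def phoneset(lexicon: Dict[str, List[str]]) -> List[str]:
    """Return the sorted set of distinct phones used anywhere in the lexicon."""
    return sorted({phone for phones in lexicon.values() for phone in phones})


def transcribe(words: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Expand a word sequence into a flat phone sequence via dictionary lookup."""
    phones: List[str] = []
    for word in words:
        key = word.lower()
        if key not in lexicon:
            # A real system would need a fallback for out-of-vocabulary words.
            raise KeyError(f"word not in lexicon: {word!r}")
        phones.extend(lexicon[key])
    return phones
```

Calling `transcribe("show all ships".split(), LEXICON)` expands the utterance into the phone sequence an acoustic model would be trained against. In the setting the abstract describes, both the entries of such a dictionary and the symbols in the phoneset would be derived from data rather than written by hand.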

Related articles

Automatic generation of phonetic transcriptions for large speech corpora

We describe a method for the automatic production of phonetic transcriptions in large speech corpora. First, we focus on the application of different techniques for the generation of pronunciation variants. Then, we explain the application of a speech recognition system for selecting the acoustically best matching phonetic transcription. The system is evaluated on different test sets selected f...

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

Extracting true speaker identities from transcriptions

Automatic speaker diarization generally produces a generic label such as spkr1 rather than the true identity of the speaker. Recently, two approaches based on lexical rules were proposed to extract the true identity of the speaker from the transcriptions of the audio recording without any a priori acoustic information: one uses n-grams, the other uses semantic classification trees (SCT). The ...

Use of Graphemic Lexicons for Spoken Language Assessment

Automatic systems for practice and exams are essential to support the growing worldwide demand for learning English as an additional language. Assessment of spontaneous spoken English is, however, currently limited in scope due to the difficulty of achieving sufficient automatic speech recognition (ASR) accuracy. "Off-the-shelf" English ASR systems cannot model the exceptionally wide variety of...

Faster time-aligned phonetic transcriptions through partial automation

A semi-automatic process for generating time-aligned transcriptions of speech data at the word and phone level is described. At each stage in the process, segment durations are estimated to generate approximate boundary markers, which are then corrected by hand. Corrections at one level are taken into account in the generation of boundaries for the next level, such that the error is reduced at ...

Publication date: 2000
